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ABSTRACT 

A Bayesian approach to the determination of stellar distances from photometric and spec- 
troscopic data is presented and tested both on pseudodata, designed to mimic data for stars 
observed by the RAVE survey, and on the real stars from the Geneva-Copenhagen survey. It 
is argued that this method is optimal in the sense that it brings to bear all available informa- 
tion and that its results are limited only by observational errors and the underlying physics 
of stars. The method simultaneously returns the metallicities, ages and masses of programme 
stars. Remarkably, the uncertainty in the output metallicity is typically 44 per cent smaller 
than the uncertainty in the input metallicity. 

Key words: star: distances - methods: numerical - methods: statistical - techniques: photo- 
metric - techniques: spectroscopic - astrometry 



1 INTRODUCTION 

Present attempts to reconcile models of the Galaxy with observa- 
tions are hampered by a lack of full 6d phase space data for a large 
number of stars. While the majority of recent stella r surveys - for 
example the Geneva-Copenhagen survey ( Nordstrom et al.l 2004h 
and the Radial Velocity Experiment (RAVE; ISteinmetj|2003h - 
provide sky positions, proper motions and often radial velocities, 
the persistent difficulty is the determination of distances, without 
which proper-motion measurements are of limited value. Since the 
fitting of models to observational constraints is arguably the method 
with the most potential for understanding the true nature of the 
Galaxy, the lack of these distance data is a key obstacle to our un- 
derstanding of the Milky Way, and thus to our understanding of 
galaxies in general. 

Several methods of distance determination are known. Per- 
haps most famously and successfully, the Hipparcos satellite mea- 
sured trigonometric parallaxe s for ~ 10^ s tars down to a V-band 
magnitude of around 12 ( Per rvmaril 1 1 9971) . Trigonometric paral- 
lax is conceptually the most fundamental distance measurement 
technique for stars; however the Hipparcos measurements were ac- 
curate only out to a dist ance of around 200 pc. Within the next 
decade, the Gaia mission ( |Perrvmanl2005h should return parallaxes 
for around 10^ stars, dramatically increasing the size of the avail- 
able data set for investigation; however in the meantime trigono- 
metric parallaxes to a vast number of otherwise well-observed stars 
are lacking. Furthermore, even after the Gaia mission is complete, 
other estimation methods will remain vital for more distant stars. 

Two other trigonometric methods of distance measurement are 
known: so-called 'Galactic parallax ' tevre & Binnevll2009l; lEvrel 
200S ) and 'geometrodynamical' techniques ( I Jin & Lvnden-Bell 
20081) . However, both these methods require strong constraints on 
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the orbits of target stars, so they are restricted to a small subset of 
all stars. 

The next most reliable distance determination technique is the 
use of so-called 'standard candles' - stars with particular properties 
such as RR Lyrae and Cepheid variables, whose luminosities are a 
reasonably sharply-defined function of observables such as period 
and colour While the accuracies attainable for such stars may be 
high, standard candles form an extremely small subset of any stel- 
lar population. Hence although they can be used to give a broad- 
brush picture of Galactic structure (see Gautschv & Saio 1996), the 
technique cannot be applied to the majority of stars of interest. 

The other major technique for distance estimation is the deter- 
mination of 'photometric distances'. This involves deducing the lu- 
minosity of a star from its colours and perhaps metallicit y, allowing 
its distance to be inferred from its apparent magnitude, ljuric et afl 
(2008) have applied this technique to millions of stars measured 
by the Sloan Digital Sky Survey (hereafter SDSS). The technique 
works well for SDSS stars because they are apparently faint and 
therefore overwhelmingly likely to lie on the main sequence, and 
the luminosity of such a star is a well-defined function of colour 
and metallicity. However for samples at brighter apparent magni- 
tudes it is not safe to assume that stars lie on the main sequence, 
and a more sophisticated technique is required. 

An example of such a sample is the RAVE survey, which is in 
many ways complementary to the SDSS sample. It covers a nom- 
inal 7-band magnitude range of 9 < / < 12 and is expected to 
provide spectrophotom etric data on up to a million stars by 2011 
jSteinmetz et alj|200^ . On account of its magnitude range, the 
RAVE sample contains a non-negligible number of giants. Such 
a heterogeneous sample therefore requires a more sophisticated ap- 
proach to parameter determination than a direct colour-to-absolute 
magnitude mapping. Furthermore it would be much more satisfac- 
tory to obtain a methodology that can give a definitive distance esti- 
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mation (and corresponding individual uncertainty) for specific stars 
rather than onl y for sizeable s t atistica l samples. 

Recently iBreddels et al estimated photometric dis- 

tances to ~ 25 000 stars in the RAVE survey. These distances were 
obtained by repeatedly seeking the stellar model that provides the 
best fit to first the data and then 5 000 pseudo-data obtained by 
scattering the observables of the first stellar model by the observa- 
tional errors. The logical justification of this procedure is unclear. 
In this paper we argue that the principles of Bayesian inference 
lead unambiguously to a different procedure that has much in com- 
m on with the procedu res for determining stellar ages in troduced 
bv lPont & Evej ( l2004h and |j0rgensen & Lindegreiil ( l2005h . In fact, 
our procedure yields not only a distance to each star but also esti- 
mates of its mass, age and metallicity. We argue that our procedure 
is optimal in the sense that it exploits the available data in their en- 
tirety - including information about the survey in question. From 
these facts it constructs a pdf for each star in model space, so es- 
timated distances, masses, ages and metallicities are accompanied 
by error estimates. The method is applicable to any survey that pro- 
vides more than one non-degenerate observable for each star and is 
limited only by degeneracies in the underlying stellar physics and 
any inherent inaccuracies in the stellar models. 

The paper is structured as follows: Section |2] covers the the- 
oretical basis of the method. Section [3] explores the application of 
the method to a fake data set. Section |4] then looks at the results 
of its application to the Geneva-Copenhagen sample. Section|5]de- 
tails the relationship of this paper to previous. Section |6] presents 
a discussion of the results and highlights the potential for future 
applicability. 



2 THEORY 

Let y represent an n-tuple of a star's observable quantities (e.g. ef- 
fective temperature Teg, surface gravity log g, observed metallicity 
[M/H]oj3j,, colours, apparent magnitudes, sky position) and let x 
represent its six 'intrinsic' variables (true initial metallicity [M/H], 
age r, initial mass A4, distance s, and true sky position {I, b)) - i.e. 



X = {[M/li],T,M,s,l,b) . 



(1) 



We will assume that a stellar model can be used to provide a direct 
mapping from x-space to y-space. For the purpose of concision, let 
G(w, cr) represent the multivariate Gaussian function 



G(w, ''■) = ((Tj\/27r) ^ exp (^-w'f /2a^^ 



(2) 



for an n-tuple w. 

In order to estimate a value and an uncertainty for each stellar 
parameter, the natural approach to take is probabilistic. We know a 
set of facts: the actual measured values of a star's observables (y), 
the quoted errors thereon (<Ty) and, subtly, the fact that the star is in 
the given sample (let this fact be denoted by S). Hence the logical 
distribution to consider is the pdf of a star's parameters Xi given 
these three facts. If we can find this distribution, then we have all 
the parametric information that can logically be inferred from the 
known facts - most importantly, we can give an expectation value 
for each parameter and an estimated error on this value. 

Explicitly, for each component Xi of x we seek the three mo- 
ments lit defined by 



where k G {0, 1, 2}. (The k = moment should properly be unity, 
but it is useful if the pdf is left unnormalized.) From these moments 
we can then infer the expectation and variance of each stellar pa- 
rameter. 

In order to express the pdf in terms of known distributions, it 
is useful to consider the full pdf 

p{x,y,cry,S) = p{x\y,ay,S)p{y,(Ty,S) (4) 
= P{S\y, X, o-y) p(y|x, (Ty) p{ay\x.) p(x). 

Hence we can expand the pdf in a form analogous to Bayes' theo- 
rem to give 

^ m -P(5'|y, X, <tJ p(y|x, (Ty) p{cry\x) p(x) 
Pl^ly, CTy, = — — , (S) 



which can be simplified to 



p{'y^\y,ery,S) oc P{S\y,x,(7y)p{y\x,(Ty)p{cry\x.)p{x.), 



(6) 



wherein factors independent of x have been neglected as irrelevant 
to the problem we aim to solve, since we will require only ratios of 
different X^fc's. 

Each of the factors in equation ^ can now be analysed inde- 
pendently: 

(i) P(S|y, X, <Tj,), which we term the selection function, ex- 
presses the probability of a star being in the sample given its inher- 
ent values and the measurement errors. This factor therefore reflects 
our selection criteria, whether they be, for instance, a magnitude cut 
or any cut on errors used to 'clean' a sample. 

(ii) For many types of observation the likelihood p(y j x, cr ^ ) can 
be approximated by a Gaussian G(y — y(x), <Tj,). More generally 
it takes a functional form determined purely by the measuring in- 
strument, in which the different components of y may or may not 
be independent. 

(iii) p{ay\x) is the probability of the quoted errors in y given 
the object's underlying characteristics x. 

(iv) p(x) is our prior. This will describe as many stellar popu- 
lations as we wish to take into account, in terms of an initial mass 
function (IMF) and a spatial distribution. It describes the relative 
abundance of stars of various types, without regard to their observ- 
ability. 

The uncertainties on the observed values of {I, b) are assumed 
to be sufficiently small for us to regard the corresponding pdfs as 
delta functions and drop the integrals over I and b, leaving a 4- 
dimensional integral in x. However, when this is done, any loca- 
tional factors in the prior must gain a multiplicative factor of in 
order to take account of the conical shape of the volume surveyed. 

In this work we assume a uniform distribution for p((Tj, | x) ; al- 
though in reality this will not be the case (for instance, more distant 
stars will tend to have higher errors), the dependence of (Ty on x 
will generally be sufficiently weak not to affect our results greatly. 
In this case, equation ^ simplifies to 

p{^\y,a-y,S) oc P{S\y,x,(Ty)p{y\x,ay)p{x). (7) 
This can be substituted into the simplified form of equation l[3]l. 



Iifc = I Xi p{K\y,cry,S)d^x, 



(8) 



Xi p(xjy,fTj;,S')d X, 



(3) 



giving a set of three integrals that together will give us an estimated 
parameter value for each star ({xi} — Tn/Tio) and an uncertainty 
thereon (ct; = {Xi2/Tio) — (xi)'^), taking into account all of the 
known information. Thus in a single pass we can infer all these 
values for each star's metallicity, age, mass and distance. 
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2.1 Selection function 

The nature of the selection function P(S|y, x, cry) bears comment, 
as it is by no means trivial. We must distinguish carefully what 
is meant by the symbol y in the formalism above: it refers to the 
actual observed parameters of the star, but only those we actually 
use in the analysis. There may be other parameters that have been 
observed and enter into the selection function. If this is the case, 
they can be expressed only as probabilities dependent on x, not on 
y. Therefore it becomes important to split the selection function 
into two parts, one dependent on y and the other on x: 



P{S\y, X, (Ty) = V(y, o"h) 



(9) 



In this product ip will generally be the dominant factor for two main 
reasons: 

(i) If the sample is based on an input catalogue, it will frequently 
be a selection on observables that are no t then reobserved - for 
example, the Geneva-Copenhagen survey ^Nordstrom et alj ; 



20041) 



was selected on the basis of the Stromgren photometry of lOlsenl 
( Il98 3. 1993, 1994a, b), which is simply transcribed to the Geneva- 
Copenhagen catalogue. 

(ii) The fundamental limitations on stellar photometry also tend 
to make i/j dominant: saturation leads to a bright-end cut being im- 
posed on the survey; at the faint end a cutoff is imposed to elimi- 
nate objects too faint for the detector's source extraction algorithm 
to distinguish true signals from noise. Both cuts are generally im- 
posed on the basis of the observed magnitude and therefore are 
functions of the observed properties of a star after scattering by 
observational errors. That is, they are functions of y rather than x. 

However, there can be contributions of the nature of (jyijsi), and these 
are of much greater importance to our analysis. For example, we 
will later be interested in comparison of the results provided by 
our method on a real stellar sample with those of the Hippar cos 
satellite. In such cases, it is common (e.g. iBreddels et al.l l20ld) to 
consider only those stars for which Hipparcos provides reasonably 
well-constrained distances, by restricting the sample to stars with 
fractional parallax errors below a certain threshold. It is important 
to recognize that this induces a bias in the sample: by considering 
only those stars with, say, cr^j/ro < 20% (where W represents the 
observed parallax and ct^ the error thereon), one will preferentially 
cut away stars with low parallaxes and hence large distances, leav- 
ing a sample biased towards small distances. 

This effect can, however, be at least mitigated by a consider- 
ation of the role of <j!>(x). For any value of x, and hence a 'true' 
model distance s, it is possible to define a pdf for a star's observed 
parallax W from a knowledge of cthj, assuming Gaussian errors: 
indeed 



p {■(^\s) — G [tz! — 1/ s, an 



(10) 



Therefore one can incorporate this bias into one's analysis in the 
form 



x) oc / G (ti7 — (Jro) dro, 



(11) 



providing an element of balance in the analysis that maximum- 
likelihood techniques neglect. 

If one can, as will generally be the case at least approximately, 
decompose the selection function into the form ^{y,ay)(j){-x), 
then since we are interested only in terms dependent on x, we can 
ignore any occurrence of V'(y) ""a) - Hence our final formula for the 
moments of a star's pdf collapses to 



lik = J Xi <;/>(x) p(y|x, cTy) p(x) d*x, (12) 
which can be readily calculated. 

2.2 Implementation 

Since for many surveys the errors on apparent magnitude are very 
small, the integration of equation l ll2t need only cover a small 
range in distance for each set of ([M/H],r, A^) values. Con- 
sequently, after experimenting with both fixed-grid and Markov 
Chain Monte Carlo techniques, it was decided that the best com- 
promise between speed and reliability was provided by a simple 
fixed-grid Newton-Cotes rectangle integration method, where one 
defines a grid of points in metallicity, the logarithm of age, and 
mass. For each point in this three-dimensional space one then inte- 
grates over a range in distance corresponding to an apparent mag- 
nitude spread of say ten times the magnitude error either side of 
the observed value - any distances beyond this range will give a 
negligible contribution to the integral due to the Gaussian factor in 
the likelihood termp(y|x, CTy). For the purposes of calculating the 
integral it was found to be sufficient simply to multiply the inte- 
grand's value at each grid point by an effective volume determined 
by the distance between the given grid point and its nearest neigh- 
bours. 

No two numbers completely characterize a general probabil- 
ity distribution. Medians, means and modes can all be used to 
characterize the centre of the distribution. The median is the most 
stable measure and generally to be preferred. Unfortunately, it is 
in general hard to calculate. T he mode, which was favoured by 
|j0rgensen & LindegrenI ( l2005b . is susceptible to noise when the 
distribution is at all flat-topped. Therefore we favour the mean 
as a robust and readily calculated measure of the pdf. Simi- 
larly, we favour the variance as an estimate of the width of the 
pdf, rather than the mo re complex confidence intervals used by 
|j0rgensen & LindegrenI 

Another advantage of using these two numbers to character- 
ize the distribution is that their calculation, via the integration tech- 
nique outlined above, avoids the thorny issue of interpolation on the 
isochrones by considering only the values of observables at tabu- 
lated grid points. Interpolation using isochrones is notoriously dif- 
ficult, and thus we avoid it as far as is possible. 

To summarize, the method we propose runs as follows: 

(i) Define a grid of points in ([M/H] , log t,A4) space. We found 
a suitable grid spacing to be given by the metallicity values given in 
Table [2] (although a greater range would do no harm - one should 
include the lowest metallicity that is available from the isochrones); 
a spacing in logr of 0.025; and the mass spacing provided auto- 
matically by the Padova isochrones, which ranges from 10"'' M0 
to 0.255 Mq. 

(ii) At each grid point, define a further set of points in distance 
s, covering the range 



s G [r(m — nom), rim + na„ 



(13) 



where m is the apparent magnitude used for fitting to observations, 
am. is the observational error thereon, and r(m) represents the dis- 
tance that produces an apparent magnitude of m given the metal- 
licity, age and mass at the grid point in question, n is some positive 
number chosen by the user - we found n = 5 to be ample for high 
precision. 

(iii) Assign to each point a 4-volume corresponding to a hyper- 
cuboid with boundary planes halfway to each of the point's nearest 
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neighbours (at the edges of parameter space, take the length in the 
limiting dimension to be that between the final two data points). 

(iv) Perform the integral in equation l ll2b by finding the value 
of (^(x)p(y|x, iTy) p(x) at each grid point, multiplied by the grid 
point's associated 4-volume. We term this product of probability 
density and volume the 'weight' of each grid point. For each of 
the first four dimensions of x, one can then define an array that 
keeps track of the sum of these weights multiplied by powers of 
each parameter Xi, and also keep a running total of the sum of the 
weights themselves, which we term Xq. 

(v) Once one has run over the entire grid (or some subset 
thereof, since one may be able to disregard metallicities too far 
from the observed value), one can then calculate the moments lik 
by dividing each running total by Xq. From the lik one then has 
directly an expectation and uncertainty for each dimension of x. 

(vi) It is of course imperative to check that one has sampled x- 
space sufficiently finely to achieve sufficient precision in the work. 
For these purposes one should redo the analysis of a sample of stars 
using a finer sampling in each dimension and check that there is 
minimal alteration in the outputs for the new sampling. If time is 
not a concern, one would ideally do this for every star, refining 
the grid iteratively to a point where there is minimal change in the 
outputs. This may require interpolation if the isochrones cannot be 
provided with a fine enough sampling. 

2.3 Terminology 

In what follows, we shall refer to the technique described above 
as the 'Bayesian method', in order to distinguish it from what we 
shall call the 'maximum-likelihood method'. The distinction here 
essentially concerns the nature of the prior and selection function. 
Maximum-likelihood techniques for fitting a model to data con- 
sist of finding the model (from some specified range) that maxi- 
mizes the probability of the data in question - i.e. in our case one 
would seek to maximize the likelihood p(yix, ay). This is mathe- 
matically equivalent to setting both the selection function and the 
prior in the above formalism to uniform distributions, but concep- 
tually it has an important difference from our Bayesian approach, 
which seeks to find the model with the maximum probability of 
being correct given the data. Maximum-likelihood techniques are 
popular due to their simplicity - they involve no consideration of 
the nature of the prior and they generally do not take into account 
the selection function. Furthermore, from a conceptual viewpoint, 
the Bayesian technique is a more justified approach to the problem 
of model selection than maximum likelihood: since the data are 
given, it is logical to seek the model with the maximum probability 
of being correct given the data. For the case of determining stellar 
ages, j0rgensen & Lindegren (2005) give a nice example of the ad- 
vantage of a Bayes ian approach over maximum likelihood, while 
IPont & Ever! | |2004|) provide an extensive discussion of the relative 
merits of a Bayesian technique. 

One more point bears comment with regard to maximum like- 
lihood (and naive Bayesian approaches) when the prior is set to be 
uniform in order to represent an unprejudiced starting assumption. 
For a uniform continuous pdf, the prior can only be defined to be 
uniform in a specific coordinate system: a transformation of those 
coordinates will in general not leave it so. Thus, for example, while 
a uniform prior in age may seem a reasonable starting point, it is 
difficult to justify such an assumption as opposed to a prior uni- 
form in e.g. the logarithm of age. Thus the assumption of a uniform 
prior in some space is nonetheless an assumption, and cannot be 
considered safer than making an explicit choice of prior after con- 



sidering all the circumstances of the particular case. In general one 
hopes for the likelihood function to be sufficiently strongly peaked 
at some value to render the exact form of the prior unimportant 
for the posterior distribution; however in such complex, degener- 
ate cases as those addressed here involving stellar evolution, such 
hopes are not always well-founded. For this reason a prior that is 
based on what we do know of stars in the Galaxy is an important 
factor in any calculation. 



3 TEST CASE 
3.1 Sample 

In order to test the consistency of the method, a fake data set was 
generated to mimic the sample observed by RAVE. For these pur- 
poses, the vector of observables was taken to be 

y = (log Teff , log g, [WHU,, J-K,J, I, I, b) , (14) 

where I, J and K denote apparent magnitudes. (Here, and through- 
out this paper, logarithms are taken to base 10.) Stars were gen- 
erated b y a Markov Chain following the Metropolis-Hastings al- 
gorithm jSahall2003h in x-space, with observational errors added 
in y-space at every step. Since the aim was to reproduce the 
joint pdf described by equation at every proposed point in x- 
space, a y value was generated by Gaussian scattering by a vec- 
tor of observational errors, and the probability P(5'|y, x, cry) was 
calculated; the acceptance was then based on the new value of 
P(Sly,x,o-,)p(x). 

The Markov chain was generated using a Gaussian proposal 
density, and ensuring an acceptance rate of order 30% by tuning 
the spread of the proposal density during a bum-in period of 10 000 
steps. The chain itself consisted of 2.5 x lO'' steps, of which forty- 
nine out of every fifty were then discarded (in order to minimize 
the chance of repeated x-points) to provide the output 50 000 stars. 
Convergence was verified by performing an identical run with the 
selection function switched off, and ensuring that the marginalized 
distributions in x corresponded to those input in the prior. 

The Gaussian observational errors were added using an error 
8-tuple 

ay = (0.0434, 7(yi), 1.07 yi - 3.71, 0.045, 0.023, 0.04, 0, 0) , (15) 
where 

ro.5 ifx<log(8 000), 

' \ 0.25 + 0.436 (2:-log(8 000)) otherwise; ^ ^ 

designed to be representative of the scale of observational errors 
in RAVE. Errors on logarithmic quantities are measured in dex. 
(Although the true errors on log Tcft and other derived observables 
may not actually be Gaussian, it was considered to be a sufficient 
approximation for this proof-of-concept. A different error distribu- 
tion could be employed very easily.) The pseudodata were therefore 
generated according to a distribution function described by 

/(x,y) =p{x,y,ay\S) oc P{S\y,x, ay) p{y\x, ay) p{x), (17) 

where the factors on the right are as follows: 

For the prior we took a three-component Milky Way model of 
the form 

3 

p(x) =p(M)^p,([M/H])p,(r)p,(r), (18) 

1=1 
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Table 1. Values of disc parameters used. 



Parameter 


Value (pc) 


othin 


2 600 


^thin 

^d 


300 


pthick 

d 


3 600 


^thick 

^d 


900 



where i = 1, 2, 3 correspond to a thin disc, thick disc and stellar 
halo, respectively. We assumed an identical Kroupa-type IMF for 
all three components and distinguish them as follows: 



Thin disc (i = 1): 

Pi([M/H]) = G([M/H],0.2), 

pi(r) oc exp(0.119r/Gyr) forrsClOGyr, 

R \z\ 



(19) 



pi(r) oc exp 



^thin 



^d 



Thick disc (i = 2): 

P2([M/H]) = G([M/H] + 0.6,0.5), 

P2(t) oc uniform in range 8 ^ r ^ 12 Gyr, 

R \z\ 



(20) 



P2(r) oc exp 



^thick 



G([M/H] + 1.6,0.5), 



uniform in range 10 ^ r ^ 13.7 Gyr, 



(21) 



Halo (i = 3): 

P3([M/H]) = 

p-i{r) oc 
P3(r) oc 

where R signifies Galactocentric cylindrical radius, z cylindrical 
height and r spherical radius. The parameter values were taken as 
in Tabled] th e values are taken from the analysis of SDSS data 
in ljuric et al.l ( l2008i). The metallicity an d age distributions for th e 
thin disc come from iHavwood ( l200lh and|A umer &Binnevll2009l) . 
while the r adial density of the halo comes from the 'iimer halo' 
detected in lCarollo et al.l (2009). The metallicit y and age dist ribu- 
tions of the thick disc and halo are influenced bv lReddvl HoO^ and 
ICaroUo et al] ^2009) . 

The normalizations were t hen adjusted so that a t the solar po- 
sition, taken as Rq = 8.33 kpc jGillessen et al.ll2009|), zn = 15 pc . 
we have number density ratios n2 / ni = 0.15 jCarollo et a lj2009l) . 
ns/ni = 0.005 jjuric et al.ll2008l) . 

The IMF c hosen follows the form originally proposed 
by iKroupa etd] {1993), with a minor modification following 
lAumer & Binnevl ( l2009D . being 



-2.2 
-2.519 



if < 0.5 Mq, 
if O.5M0 ^ M < 
otherwise. 



IMr. 



(22) 



p{M) oc 0.536 Al" 
I 0.536X" 

We deter mined y as a functio n of x from the isochrones of the 
Padova group jBertelli et al.ll2008l) , which provide tabulated values 
for the observables of stars with metallicities ranging upwards from 
around [M/H] ^ —0.92, ages in the range r e [0.01, 19] Gyr and 
masses in the range M G [0.15, 20] Mq. We used isochrones for 
16 metallicities as shown in Table|2l selecting the helium mass frac- 
tion Y as a function of metal mass fra ction Z according to the rela- 
tion used in lAumer & BinnevI ^20091) . i.e. Y ^ 0.225 + 2.1Z and 



Table 2. Metallicities 
(0.017,0.260). 



of isochrones used, taking {Zq,Yq) 



z 


Y 


[M/H] 




\j.Zj\j 


n OT /I 


U.UUj 


n 01 1 

U.Z.J I 


— U. ( io 


0.004 


0.233 


-0.652 


0.006 


0.238 


-0.472 


0.008 


0.242 


-0.343 


0.010 


0.246 


-0.243 


0.012 


0.250 


-0.160 


0.014 


0.254 


-0.090 


0.017 


0.260 


0.000 


0.020 


0.267 


0.077 


0.026 


0.280 


0.202 


0.036 


0.301 


0.363 


0.040 


0.309 


0.417 


0.045 


0.320 


0.479 


0.050 


0.330 


0.535 


0.070 


0.372 


0.727 



assuming solar values of {Yq, Zq) = (0.260, 0.017). The metal- 
licity values were selected by eye to ensure that there was not a 
great change in the stellar observables between adjacent isochrone 
sets. 

The selection function P(Sjy, x, ay) was chosen to describe 
rave's selection criteria. RAVE observes stars with nominal DE- 
NIS J-band magnitu des in the range 9 < /denis < 12; however 
IZwitter et af] ( |2008[) explain that the upper limit actually extends 
up to one magnitude fainter, and that there is evidence of satura- 
tion around /denis < 10. For these reasons it was decided to 
take the full range of /-band magnitudes observable by RAVE to 
be 4 < / < 13, and to disallow stars falling outside this range. 
Although the brighter limit may seem overly permissive, its ac- 
tual value has little importance due to the other major factor in 
P{S\y,:K,(Ty): a completeness term. Although RAVE is theoret- 
ically capable of observing all stars within its magnitude limits, it 
is not a complete survey and thus stars at certain magnitudes have 
a higher probability of being included in the catalogue than others. 
For the purposes of this test , it was decided to use an approximation 
of fig. 4 of Stei nmetz et al.l (2006), of the form 

P(S'|y, X, (Ty) oc 2.9 G{I - 9.8, 0.76) + G{I - 11.7, 0.51). (23) 

While this neglects variations in completeness with sky position, it 
seems a reasonable approximation for our pseudodata. The func- 
tional form results in stars with particularly bright apparent magni- 
tudes being given very low weight irrespective of our chosen value 
for RAVE'S low-/ cutoff. 

The other factor included was a cutoff at Galactic latitudes of 
|6| ^ 25°, since RAVE avoids regions close to the Galactic plane. 
This has the obvious effect of biasing the sample slightly towards 
thick-disc and halo stars. (It also prevents any density divergence 
near the Galactic centre due to the halo density profile.) 



3.2 Results 

50000 stars were generated according to the above model. Inter- 
polation in the isochrones was necessary for these purposes; since 
the chosen sampling of the isochrones was reasonably dense it was 
decided to use linear interpolation rather than a more complex and 
arbitrary method. The density of the sampling points should en- 
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Figure 1. Distribution in metallicity (upper plot) and J-magnitude (lower 
plot) for the pseudodata (full lines) and stars in the RAVE catalogue (dashed 
lines). 



sure that any errors introduced by this interpolation technique are 
minimal, and certainly sufficiently small for the present testing pur- 
pose. The resultant distributions in metallicity and in J magnitude 
are displayed in Fi g.[T] along with that of the second data release of 
the RAVE survey jz witter et al.ll2008h . showing that the sample is 
a reasonable mimic. This sample was then analysed using the tech- 
nique expounded in Section [2] The y-space fitting (the likelihood 
term p(y|x, cry)) was performed in the first five components of y. 
It was decided against using the J-band for analysis due to the sat- 
uration concerns described above, and the consequent fear that the 
/-magnitudes of model stars will not correspond to the Jdenis sys- 
tem. The same reliability fear militates against considering /(x) in 
the factor </>(x). The factor p(x) was initially given the same form 
for the analysis as for the sample generation, while the factor (^(x) 
was taken to be flat, since the selection function was entirely a func- 
tion of y. 

In order to perform the integration of equation M2\ . we fol- 
lowed the prescription of Section (2] After trying various subdivi- 
sions of x-space, it was determined that the optimal subdivision, 
balancing speed against accuracy, was obtained by taking the grid- 
points of the integration at the metallicity values found in Table[2l 
increments of 0.025 in log(r/Gyr), and the non-uniform mass 
spacing provided by the isochrones. Integration in these three di- 
mensions was performed over the entire isochrone range. The dis- 
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Figure 2. Density plot of the relationship of calculated to true distance for 
stars out to 500 pc. Contours measure density in points pc~^. 



tance integration was performed using n = 5 in equation J13t for 
the apparent J-magnitude, with 5 distance points used for each 
point in ([M/H], log t,M). Using more points than this was found 
to give negligible improvement in results, since the actual spread 



; tiny 
et £d] 



for each ([M/H], log t,M) value is extremely small due to the tirr 
value s of oj in the 2MASS survey (~ 0.023 mag, Skrui 
|2006|) . and correspondingly in the RAVE sample. 

The results of the analysis are displayed in Figs.|2]and[3] Fig.|3] 
displays a histogram of the difference between the calculated value 
of each star's distance and its true value, divided by the distance 
uncertainty as returned by the method. Although it can be clearly 
seen that the distribution is skewed (indeed we have no a priori 
reason to expect it not to be), it is not biased: the distribution dis- 
played has a mean value of 0.009 and a dispersion of 0.93, giving a 
reasonable sign that individual error estimates are trustworthy. For 
comparison, a fit using a simple maximum-likelihood method (i.e. 
dropping the factor J3(x) from the analysis) is displayed in Fig.|4] 
The mean of the distribution shown in the top panel of Fig. |4] is 
0.15, with a dispersion of 0.68; this bias confirms the criticism of 
flat priors in Section [23] It is noteworthy that this bias persists de- 
spite the fact that the output uncertainties are significantly larger 
in the maximum-likelihood case, as demonstrated by the bottom 
panel of Fig. |4] (indeed, the small dispersion of the top panel of 
this figure implies that these uncertainties are systematically over- 
estimated). Furthermore, the positive wing displayed in the middle 
panel of Figs.[3]and|4]is notably more pronounced in the maximum- 
likelihood case. Hence it can be seen that the method outstrips stan- 
dard photometric distance determination techniques. 

In the case of an analysis of real stellar survey data, the prior 
will not be known to perfect precision. Consequently, we have also 
performed the analysis with two different incorrect priors, each 
consisting of a single stellar population: 



Approximate prior 1: 

p([M/H]) = G([M/H] 4-0.12,0.2), 

p(r) oc exp(0.119r/Gyr) forrsJlOGyr, (24) 

p{M) oc 

R \z\ 



p(r) 



exp 



2 000 pc 400 pc 
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Figure 3. Results of analysis using the Bayesian method. Top panel: dis- 
tribution of normalized residuals. While the distribution is skewed, it is al- 
most entirely unbiased. Middle panel: distribution of residuals as a fraction 
of true distance. Bottom panel: distribution of fractional uncertainties. 



Approximate prior 2: 



p([M/H]) 

p(r) 
p{M) 

p(r) 



G([M/H],0.3), 
exp(0.13r/Gyr) 

R 



for r ^ 10 Gyr, 



(25) 



exp 



2 000 pc 400 pc 



Fig- H] shows the results of the analysis of our pseudodata us- 
ing each of these priors. The results are remarkably good, aligning 
extremely closely with those using the correct prior. This is very en- 
couraging, as it implies that the use of an approximate prior in the 
analysis of a real sample will give very reliable results. Most impor- 
tantly, the results in Fig.[5]are incontrovertibly better than those of 
Fig.m signalling that the use of an approximate prior in any photo- 
metric distance determination must be preferred to the use of a flat 
prior - maximum-likelihood techniques are sub-optimal in such a 
complex situation. 

Fig.|6]displays the uncertainty histogram produced from anal- 
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Fractional uncertainty aj[s) 
Figure 4. As Fig.|3] but for a maximum-Ukelihood method. 



ysis of a sample generated in exactly the same way as before, ex- 
cept with the observational error on log g halved to aiog g (yi ) = 
0.5 7(yi) dex. When analysed with corresponding uncertainty in- 
cluded, there is a dramatic improvement in the accuracy obtained 
(the curve labelled 'half error'). This is not surprising, since sur- 
face gravity is the key discriminant in differentiating between gi- 
ants and dwarfs, and since the difference in brightness between the 
two species is so vast, any improvement in our ability to discrimi- 
nate between them will rapidly decrease the uncertainty of our dis- 
tance estimate. Quantitatively, halving the error in logg reduces 
the uncertainty in the distance by a factor of order 23%. Hence it 
is of the utmost importance to beat down observational errors in 
measurements of log g whenever photometric distances are to be 
obtained. 

The negative wing of the distribution shown in the top panel 
of Fig.[3]is largely composed of stars that are best modelled as old 
stars (r ^ 6 Gyr) of around one solar mass. The underestimation 
of the uncertainties on such stars stems from the high weight that 
the prior assigns to such stars; if the data can be matched by such 
a star, the probability of this match is high and the uncertainty in 
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Figure 5. As Fig. |3] but using two flawed priors as described in the text. 
Blue line, full: correct prior; Black line, dashed: approximate prior 1; Red 
line, dotted: approximate prior 2. 

the distance is dominated by thie small error in the star's apparent 
magnitude. 



3.3 M etallicities, ages and masses 



Figs.lSHTOlshow the performance of our method in the recovery of 
metallicities, ages and masses. Fig. [8] shows that there is minimal 
bias in all three measurements, and that the estimated uncertainties 
are reliable. Furthermore, Fig. [TO] shows that with estimates of the 
assumed precision (0.043 dex in T^s, ~ 0.5 dex in log gr, ~ 0.3 dex 
in [M/H], 0.045 mag in J - if and 0.023 mag in J) stellar pa- 
rameters can be determined with good precision. Remarkably, the 
output uncertainties in [M/H] are strongly peaked at ~ 0.18 dex, 
significantly smaller than the 'observational' errors (~ 0.32 dex, 
see Fig. |7). This reduction in uncertainty is made possible by si- 
multaneously using all available information, which includes the 
physics of stellar evolution and the morphology of the Galaxy; it 
is not in any sense a 'creation of information'. The errors given in 
the RAVE catalogue are conservati vely large to allow for imper- 
fections in the calibration data sets jz witter et al.ll200§) . They are. 
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Figure 6. Normalized distribution of quoted fractional uncertainties from 
analysis of both the original pseudodata, with full cri^gg ('original') and 
data with half of this gravity error ( 'half error'). 
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Figure 7. Histogram of the quoted 'observational' metallicity errors in our 
pseudodata. 



moreover, based on the analysis of individual stellar spectra, with- 
out regard to the known properties of the population from which 
the individual star is drawn. It is to be expected that the injection 
of prior information about the Galaxy's stellar populations and the 
information carried by the photometry diminishes the uncertainty 
on the metallicity of each star. 

Also included in Fig. [8] are the results obtained with the poor 
prior. It can be seen that only the metallicity results are particu- 
larly altered, which is extremely promising: the bad input prior (a 
Gaussian centred on the wrong value) could have been dismissed 
a priori by comparison with the observed metallicity distribution 
of the sample, and thus metallicity space is in fact the least at risk 
from such effects. Consequently the biasing seen in the first panel 
of Fig.[8]is unrealistically pronounced. 

One final note; while extinction and reddening have not been 
included in this work, any model of these effects could be included 
in the analysis simply by altering the dependence of colours and 
magnitude on position. 
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Figure 8. Distributions of residuals in [M/H], log r and A4 normalized by 
the corresponding uncertainties for the pseudodata using the correct piior 
and using the approximate priors. Blue line, full: coirect prior; Black line, 
dashed: approximate prior I; Red line, dotted: approximate prior 2. The 
mean is displayed for each distribution. 



4 REAL- WORLD TESTING: THE 
GENEVA-COPENHAGEN SAMPLE 

The previous section demonstrated the success of the method ap- 
plied to a well-controlled sample of pseudodata. Here we inspect 
its performance when applied to real-world data, with all of its as- 
sociated noise and limitations. T o this end we have an alysed the 
Geneva-Copenhagen survey data ( iHolmberg et"ai]|2009l) . selecting 
all stars with Hipparcos parallaxes. 

Since error propagation from parallax to distance space is not 
trivial for stars with sizeable fractional parallax errors, it is best to 
perform the comparison in parallax space itself. Our method can 
provide an estimated value and uncertainty for each star's paral- 
lax as easily as its distance and with equal validity, so this com- 
parison is a simple matter. This also permits comparison with a 
larger sample of stars than would be permitted in distance space, 
since even stars with negative Hipparcos parallaxes can be used in 
a parallax-space comparison. Consequently the subtle biasing in- 




{{M}-Mt„,)/Mt, 



Figure 9. Distributions of residuals (as a fraction of true values in the cases 
of metallicity and mass) in [M/H], log t and M, for the connect prior. 



troduced by clipping the sample at some parallax or parallax error 
can be avoided. 

In Section [3] we used rather a basic characterization of the 
metallicity distribution of the thin disc. The Geneva-Copenhagen 
survey (hereafter GCS) is dominated by the thin disc, so it is worth- 
while to use the best available model of the thin disc's metallicity 
distribution. The analysis of the GCS by'Xumer & BinnevI ^00% 
found the underlying metallicity distribution to be well fitted by a 
pdf of the form 



Pi([Fe/H]) = G([Fe/H] + 0.12, 0.17), 



(26) 



and we used this to represent the metallicity distribution of the thin 
disc in equation l |19l l, approximating [M/H] ~ [Fe/H], 

In order to be able to use the Padova isochrones for our analy- 
sis of the GCS data, we required magnitudes in the Johnson system 
for the stars. Since T ycho magnitudes are available for the vast ma- 
jority of th e sample I Perrvmarjl 19971) , we used the transformation 
relations o f lBesselll l [2OO0l) to convert these to Johnson magnitudes, 
assuming a conservative spread of 0.02 mag about the transforma- 
tion lines. 
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Figure 10. Distributions of uncertainties in [M/H], logr and JV\ for the 
con'ect prior. Uncertainties in metallicity and log age are absolute; for mass 
both an absolute error (measured in solar masses) and a fractional eiTor 
distribution are displayed. 
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Figure 11. Distribution in effective temperature, metallicity and V- 
magnitude for our GCS pseudodata (full lines) and star in the real GCS 
catalogue (dashed lines). 



We then generated a pseudodata set to mimic the GCS. For 
this set, we performed the same procedure as in Section[3] taking 



y = (logrcft,l^,B-K[M/H]„ 



(27) 



with the selection function defined by the intersection of the fol- 
lowing conditions: 



P(Siy,x,a,) = { 
P(S|y,x,a.,) = { 



I logTeff-3.6 

P{S\y,^,CTy) = 0.24 

P(S|y,x,a,) = G(\/'-50,13). 



1 if 0.28 =^ F - J < 1.72, 

otherwise; 

1 if log 3^3, 
otherwise; 



(28) 
(29) 



if 3.6 < logTeff < 3.84, ^30^ 
otherwise; 

(31) 



Condition ([28} is appropriate because the GCS was designed to 
limit the survey to F and G stars; the colour cut we have im- 

S iosed is conservative, perm itting stellar types in the range A5-K0 
Binnev & Merrifielj[l998h . Our log g cut crudely reflects the se- 



lection against giants that was made on the basis of Stromgren pho- 
tometry; the other factors are simply based on visual inspection to 
provide a reasonable mimic of the actual distributions of the data 
in observable space. We also imposed a Gaussian error distribution 
on the 'measured' parallax with dispersion 



0.3 mas 

(0.3 + Qm{V 



-5)= 



if V < 5, 
otherwise, 



(32) 



based on an approximate fit to the variation of Hipparcos' parallax 
error with apparent ^-magnitude. The resultant match is displayed 
in Fig.im 

For the analysis of the GCS data it was decided to take 



(33) 



if 0.28 4:V - J < 1.72, 
. otherwise, 

and the prior p(x) from equation ( |18t ff. Including a log g depen- 
dency within <^(x) would have been possible, but it was felt to be 
too haphazard to simply cut away all stars with e.g. log g < 3, since 
the GCS selection was made on the basis of Stromgren colours and 
not on direct measurements of surface gravity. Consequently we 
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analysed both the pseudodata and the real GCS data using a selec- 
tion function as in equation ( I33t . The fitting was performed in the 
variables (log Toff, V,B — V), since we were not confident in the 
precision of matching [M/H] to [Fe/H], 

In Fig.[T2]we display the results of analysis of both the pseudo- 
data and the true GCS sample; both samples contained 14 233 stars. 
It can be seen that the match is very good. In the first and fourth 
panels we display the mean and standard deviation of each distri- 
bution. To find these values for our analysis of the true data we have 
clipped 62 outliers with a:-axis values below —6. The total uncer- 
tainty in the first panel, a^*''', is found from adding our output un- 
certainty and the Hipparcos error in quadrature. The uncertainties 
in the third and fourth panels are those purely due to our method; 
the fourth panel displays the normalized residuals for the pseudo- 
data analysis when one does not include any observational error 
in the 'observed' parallax measurements. It is clear from Fig. [12] 
that our method works convincingly on real data. Fig.[T3]shows the 
output from applying a maximum likelihood analysis to the same 
data, and it can be seen that the results are comparable to those 
from section[3l the mean of the normalized residuals distribution is 
significantly biased, falling at a value of 0.50. 



4.1 Binary stars 

The method developed here is properly applicable only to single 
stars; however binaries with a mass ratio reasonably far from unity 
will not present a significant problem since the observations can be 
expected to capture the properties only of the more massive star. 
It is instructive to consider the worst case - that of two equally 
massive stars - in which the absolute magnitude will be 0.75 mag 
brighter than for one of the stars. If all other observables are as they 
would be for one of the stars, this can be expected to result in an 
underestimate of the distance by around 29 per cent. Fig.|3]demon- 
strates that this is within, or on the order of, the output uncertainties 
for virtually all stars due to the underlying physics, and thus does 
not dramatically compromise our results. 

The GCS catalogue makes an empirical test of this theory pos- 
sible. Fig.ll4lshows the histogram of normalized parallax residuals 
for our analysis of the GCS data, divided into those that have been 
flagged in the GCS catalogue as a probable binary, and those with- 
out this flag. Each group is labelled with its mean and dispersion 
after cutting outliers at a; < —6. The photometric parallaxes of the 
binaries are larger than their measured parallaxes by 0.17a in the 
mean, while the parallaxes of the single stars are O.lla smaller in 
mean than their measured parallaxes. Since cr/u7 ~ 0.36 (Fig.ll2t. 
this level of bias corresponds to ~ 6 per cent for each star, which 
lies well within the predicted worst-case level. Moreover, since the 
probability that a given system will be flagged as a binary is an in- 
creasing function of received flux, and therefore of true parallax, 
flagged stars may be expected to have Hipparcos parallaxes that 
are on average slightly smaller than their true parallaxes. This can 
be seen by considering an imaginary sample of stars with a single 
measured Hipparcos parallax; these stars will have been scattered 
by observational errors from true parallaxes both above and below 
the measured value. If one then selects, independently, stars with a 
preferentially larger true parallax, one will obtain a sample which is 
fundamentally biased to true parallaxes greater than the Hipparcos 
measurements. Expanding this reasoning to the entire Hipparcos 
sample, it must hold for every measured parallax, and thus one ex- 
pects the subsample flagged as binaries to be biased in exactly this 
manner. Consequently the amount by which our photometric par- 
allaxes exceed the true parallaxes is likely to be smaller than the 
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Figure 12. Photometric parallaxes from GCS pseudodata (blue lines, full) 
and from the real GCS sample (black lines, dashed). Top panel: the normal- 
ized distribution of parallax residuals; OTots is obtained by adding the errors 
returned by the method and the Hipparcos errors in quadrature. Each line 
is labelled by (mean, dispersion). Second panel: distribution of residuals 
as a fraction of true parallax. Third panel: distribution of fractional photo- 
metric (output) uncertainties. Bottom panel: equivalent to the top panel but 
showing true normalized residuals for the pseudodata, removing the 'obser- 
vational' error. 
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Figure 13. Normalized parallax residuals for the real GCS sample for a Figure 15. The distribution of stars in fractional error space, from analysis 

maximum UkeUhood analysis. of the GCS data. Fit curves are described in Section|4] 




Figure 14. Histogram of the normalized residuals in parallax space for stars 
thought to be single (blue line, full) and those flagged as binaries (black line, 
dashed). For each distribution, the mean and dispersion are displayed in the 
legend. 



amount by which they exceed the Hipparcos parallaxes. In fact this 
effect may also account for the discrepancy between the photomet- 
ric and Hipparcos parallaxes of 'single' stars: many of these objects 
will be binary stars that are too distant to be resolved, and it is likely 
that in more than half of these cases accidental errors have boosted 
rather than diminished the Hipparcos parallax. 



4.2 Parallax vs distance 

It is of interest to inspect the results of our analysis of the GCS 
in distance as well as in parallax. Fig. [15] shows a plot of the dis- 
tribution of output fractional uncertainties in distance against frac- 
tional uncertainties in parallax. It follows a rather peculiar 'high- 
heeled shoe' shape. We can explain the general form of this dis- 
tribution by considering two highly simplified forms for a star's 
distance pdf. Inspection of the pdf for stars in the 'heel' of the shoe 
(cTit,/ (ro) ~ 0.3) shows the typical pdf to be reasonably approxi- 
mated by a single power-law with a low-s cutoff, i.e. 



P(s) oc 



for s < a, 
otherwise. 



(34) 



If we calculate the moments of this highly simplified pdf we find 
that, letting X = o^/ {zu) and Y = Os/ {s), we should expect the 
relation 



(35) 



VT+x^ + VT+i^ = 2, 

which is overplotted on Fig. [15] and can indeed be seen to fit the 
heel of the shoe. 

Regarding stars on the 'sole' of the shoe, inspection of their 
pdfs in distance reveals them to be largely bimodal. As a very crude 
approximation of this we can take 



p{s) = P5{s- si) + (l- P)5{s~ r?si) , 



(36) 



where 5{x) represents the Dirac delta function. The resultant mo- 
ments are not trivial, but if we consider only the case in which 
77 3> 1, we obtain 



1-/? 



1-/3 



Y '. 



1 



(37) 



which is also overplotted on Fig.[T5]and provides a good fit to the 
sole of the shoe, despite the crudeness of the approximation. 

The fact that this extremely simplistic widely-spaced bimodal 
distribution provides such a good fit to Fig. [TS] conceals an impor- 
tant fact. It is related to the lack of log g information in the GCS 
data in the form that we have used them: the distance-fitting al- 
gorithm has essentially no way of distinguishing between dwarfs 
and giants for many of the stars, resulting in two widely-spaced 
peaks in the distance pdf. The large uncertainties on the distance 
reflect this lack of discrimination, providing an even more cogent 
argument than that of Fig.|6]for the necessity of tight observational 
constraints on log g if accurate spectrophotometric distances are to 
be obtained. In this case it is clear that the best choice for an in- 
dication of each star's position is actually given by its measured 
parallax and the uncertainty thereon, as opposed to the direct dis- 
tance determination provided by the method. 

Since the GCS stars virtually all have Stromgren photome- 
try, it is of interest to study the results from running our method 
when these data are included. To do this we need a set of stellar 
models that provide Stromgren magnitudes. The BaSTI isochrones 
jCassisi et al.l200^ provide such models. We used the [Fe/H] val- 
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Figure 16. The distribution of normalized residuals from an analysis of the 
GCS data using the BaSTI stellar models. 

ues in the set 

[Fe/H] G {-2.27,-1.79,-1.49,-1.27,-0.96, 
-0.66, -0.35, -0.25, 0.06, 0.26, 0.40}. 

Unfortunately, it proved impossible to obtain a convincing match 
between these models and the data: for many GCS stars (~ 11 000 
of the 14 233) no BaSTI model lay within ~ 2a of its measured 
properties - the errors on the GCS data are ~ 0.01 in log Teg 
jNordstrom et al.ll2004h and ~ 0.003 in 6 - ?/ jOlsenll 198311 19931. 
Il994al lbh. It is also clear from an inspection of the data that a simple 
systematic shift in effective temperature cannot solve this problem. 
In an attempt to work around this difficulty, we dropped b — y from 
y, making the observational constraints 

y = (logTeff, V, mi,c^, [Fe/H]obs) , (38) 

and we enlarged the errors in photometry to ten times their quoted 
medians; hence 

(Ty = (0.01, 0.05, 0.04, 0.06, 0.1). (39) 

We maintained the same prior as for our previous analysis, and took 
a flat 0(x). The resultant histogram of normalized parallax residu- 
als is displayed in Fig. [161 along with the mean and dispersion of 
the distribution. It can be seen that although the distribution is, as 
ever, unbiased, the errors in this case have been noticeably under- 
quoted, resulting in a dispersion of 1.7 in the distribution of nor- 
malized residuals. This is hardly surprising given the questionable 
nature of the fit between the models and the observations. Fig. 1171 
shows that the distribution of the stars in error space is similar to 
that of Fig.|15l however, since the top of the shoe occurs at ~ 1.4 
rather than ~ 2.5, the distance errors are now smaller, reflecting the 
fact that the Stromgren colours give a significantly better handle on 
the dwarf/giant dichotomy than standard Johnson colours. 



5 RELATION TO PREVIOUS WORK 

It is worth briefly considering the difference between the technique 
present ed here and that emp loyed on the second RAVE data re- 
lease bv lBreddels etalJbOld) . Breddels et al. estimate the distance 
to each star by finding the model star that gives the smallest chi- 
squared value when its observables are compared with the obser- 
vations. Then 5 000 'observed' realizations of the model star are 
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Figure 17. The distribution of stars in fractional error space, from analysis 
of the GCS data using the BaSTI stellar models and Stromgren photometry. 



produced by scattering the observables of the best-fitting star with 
errors drawn from a Gaussian in y-space, of width determined by 
the stated observational errors. Finally the model that best fits each 
of these 5 000 pseudo-stars is found, and the the mean and vari- 
ance of the absolute magnitudes of the 5 001 best-fitting models is 
calculated. 

This technique can be viewed as a form of maximum- 
likelihood estimation: it essentially involves considering only the 
likelihood term p(y|x, (Ty) from our equation Q. The main short- 
coming of the procedure is that it fails to take advantage of the fact 
that some types of star are extremely rare, so it is much more likely 
that a particular datum reflects bad luck with the observations than 
detection of a very rare type of star, such as a young, very metal- 
poor star, whose true observable coincides with the datum. Incor- 
poration of the selection function and the prior allows the computer 
to choose the model star that is the sanest choice. 

There is an additional problem with the procedure used by 
'Breddels et al.': in the event that the data lie say 2a from the true 
values of the star's observables, the method involves fitting model 
stars to data points that lie 3 or even 4(j from the true value. Com- 
pou nding errors in this w ay is not sensible. 

IZ witter et al.l ilOld) have recen tly made significant improve- 
ments on the work of'Bre ddels et al.L introducing a mass prior and 
a more sophisticated treatment of the RAVE errors, as well as ex- 
plor ing the effects of differen t sets o f stellar models. 

J0rgensen & Lin degrenI ( l2005h . building on the work of 

IPont & Ever (2004) (see appendix |AJ, developed a Bayesian 
method for the estimation of stellar ages (which extends naturally 
to other parameters), involving a very simple prior and marginal- 
ization over all parameters except the one of interest. The concept 
of regarding distance as a stellar parameter as we do in this paper 
was iiot explored; 

ij0rgensen & Lindegren ' use the mode to characterize each dis- 
tribution, a decision with which we disagree for the reasons given 
in Section |2] Confidence intervals, which they used to character- 
ize errors, are good measures but in general comparatively diffi- 
cult to calculate, and more importantly require interpolation in the 
isochrones, which we have endeavou red to avoid. 

The prior used in the analysis of lJ0rgensen & LindegrenI is ex- 
tremely simple: it is flat in both metallicity and age, although in 
initial mass it is a single power law, representative of a simple IMF. 
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While the rationale for this choice is well-established, we feel that 
it does not take advantage of the large body of prior knowledge 
that we have regarding the samples with which we are dealing: the 
Galaxy is certainly not, for example, uniformly distributed in age. It 
seems logical to exploit all the information that is available, and that 
includes our significant previous knowledge of the distributions of 
each stellar parameter throughout the Galaxy. 

An important point releva nt both to the work of 
J0rgensen & Lindegren, and that of iBreddels et alj regards speed: 
J0rgensen & Lindegrenl 's techniq ue of analysing the ma rginalized 
pdf for the stellar age (or mass in iNordstrom etai]|2004l) does not 
permit the versatility of calculation that our method provides: by 
considering the distance as an intrinsic parameter of each star, on 
the same footing as its metallicity, age and mass, we are able to 
provide a consistent and simultaneous (and therefore rapid) deter- 
mination of the values of all four parameters for each star, without 
requiring successive marginalizations t hat would signifi cantly slow 
the analysis. Likewise the technique of IBreddels et alj requires the 
solution of 5 001 optimization problems for every star, as opposed 
to a single-pass integration. Our technique therefore represents 
a significant improvement in efficiency, which is particularly 
relevant in the analysis of the large surveys that are both under way 
and planned. 



6 CONCLUSIONS 

We have presented a Bayesian technique for determining stellar 
parameters from photometric and spectroscopic data. It has been 
demonstrated that the technique outperforms maximum-likelihood 
techniques, and the mathematical and physical basis of the system 
ensures that all available information can be exploited in the cal- 
culation. The resultant uncertainties, assuming a reasonable prior 
is chosen, are therefore a consequence purely of the underlying 
physics, and by virtue of this the technique is optimal - given a set 
of data and some level of understanding of the underlying physics, 
one cannot do better Since the uncertainties derived from our tech- 
nique simply reflect the physics, so long as one employs a reason- 
able prior, the technique will provide the most accurate possible 
estimates of the true uncertainties - smaller estimated uncertainties 
could only come from an overly restrictive prior, undersampling 
the pdf in some manner o r defining the uncertaint i es in a different 
manner, as for example in |j0rgensen & LindegrenI ( l2005h . 

The problem of fitting stellar models to observations is an 
ideal setting for the application of Bayesian probability theory. We 
have a wealth of advance knowledge of the underlying physics with 
which to construct a prior, and the complex relationship between in- 
put parameters and observables for stellar models renders a simple 
maximum-likelihood approach suboptimal for providing reliable 
parameter determinations. There are favoured regions of observ- 
able space; our approach exploits this by virtue of the input prior. 
The Bayesian method also enables a valuable synthesis of different 
areas of astrophysical research: large-scale analysis and mapping 
of stellar populations in the Galaxy feeds us information for the 
prior, whilst the specifications of observations give us the form of 
the likelihood and selection functions. 

As with all photometric techniques, obscuration, which we 
have neglected, is an important issue. It would be simple to include 
reddening and extinction models, but inevitably the results would 
then be vulnerable to weaknesses in the adopted models. When a 
large body of trigonometric parallaxes for relatively distant stars is 
at hand, it will be possible to adjust such models by making pho- 



tometric distances compatible with trigonometric parallaxes. Such 
work will be a key project to be undertaken with the Gaia Cata- 
logue. 

A simplification we have made is to consider all metallicity 
information to be encoded in the value of [M/H] for each star. The 
chemistry of stars is in reality more complex and both helium abun- 
dance and alpha enhancement play significant roles in stellar evo- 
lution. We have not explicitly addressed this issue because to do 
so we would require both further sets of isochrones and additional 
spectroscopic observables. However, such refinements will be pos- 
sible within a decade; most importantly, their incorporation would 
represent no fundamental change to the method presented here. 

While our method is strictly only applicable to single stars, 
dividing the GCS sample into probable binaries and single stars re- 
veals that the parallaxes of binaries are under-estimated by at most 
~ 0.2a in the mean, and probably by only half this figure (Sec- 
tion |4T|(. The effect of binarity on the estimation of other stellar 
parameters is more subtle, however I J0rgensen & LindegrenI jlOOSi) 
provide a demonstration that its effect on Bayesian age determi- 
nation is quite limited. The effect of binaries could be incorpo- 
rated into our formalism by including a probability of binarity in 
the prior, along with its associated effect on the various observ - 
ables; this is the approach taken in the work o f |Pont&Eveil ( l2004h . 
Taking full account of such contamination would require observa- 
tional simulations to find the effect of different mass-ratio binaries 
on each observable. 

In this work we have introduced the selection function 
P(S|y,x, (Ty) as an explicit term in our calculations. Its inclu- 
sion provides an intuitive and logically consistent means of taking 
account of all selection effects introduced by instrumental appara- 
tus, observing strategies and later choices (such as cuts on errors 
to 'clean' a sample). The presence of this factor in our calculations 
in the form of </>(x) is key, as it truly brings the model to the data 
rather than vice versa, and thus permits a theoretically much more 
satisfactory basis for all our calculations. 

The distribution defined by the product of the selection func- 
tion and the prior has another use. This distribution can be used as 
in equation l |17| l to generate a sample of pseudodata, which can then 
be scattered by Gaussian 'observational errors'. The distributions 
of the various observables of this pseudodata can then be compared 
with those of the true data set that one wishes to mimic, in order to 
assess the accuracy of the different factors one has included. Thus, 
if one is confident of the selection function (as will often be the 
case), one can adapt the prior until a good fit is found between the 
pseudodata and the real data. Once this has been achieved, this op- 
timized prior can then be used in the analysis of the true data with 
a high degree of confidence. 

In this work the form of the prior has been kept intentionally 
simple: we have dealt with a case in which the Galaxy is described 
by three stellar populations. Depending on the nature of the sam- 
ple being analysed, the prior could take a large range of forms. 
An interesting case that we have not considered is when the data 
include kinematic quantities. Then the prior would include a full 
phase-space distribution function for each stellar population, and 
this enrichment of the prior information that is brought to bear on 
the data would further reduce the uncertainties in estimated dis- 
tances, metallicities, etc. Indeed the power of the Bayesian method 
described in this paper lies in its generality; although it was ap- 
plied only to spectrophotometric data in Sections [3] and [4] nothing 
restricts the inclusion of other observables. 

Of course the correct form for the prior is unlikely to be known 
absolutely. One may however be able to specify it with a reason- 
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able degree of certainty ~ one might, for example, be able to give 
the prior in a parametric form with a certain pdf over the param- 
eters that specify it (such as disc scale lengths). The integration 
described in Section |2] can then also be performed over these pa- 
rameters, providing slightly less assumption-dependent values for 
the method's outputs. 

We have not explored the question of the joint probability dis- 
tributions of different parameters conditional upon the data for each 
star. Using the same technique as we describe but marginalizing 
only over selected dimensions of x, it would be possible to exam- 
ine such distributions and analyse covariances between different 
parameters for selected stars. This extension would be simple and 
one can envisage cases in which it could prove fruitful. 

Another consideration regarding the pdf for each star is that 
of bimodality. We have characterized each star's pdf in distance 
by its mean and dispersion; however an extension of this approach 
would be to search through each star's pdf in distance to identify 
any instances of multimodality. Some stars essentially present two 
distinct solutions for the given observables: one being a dwarf lying 
comparatively nearby, the other a more distant giant (as in the cases 
explored for the GCS in SectionQ. It could be fruitful to identify 
such degeneracies explicitly, and provide a mean and dispersion 
for each peak in the pdf. Starting from the algorithm described 
in this paper, one could identify any points in the assigned grid 
whose probabilities (by equationQ lie above some chosen thresh- 
old value, and then seek to identify sizeable islan ds in this subset of 
the sp ace by a friends-of-friends type algorithm jHuchra & Gelleil 
Il982h . This would provide a method of partitioning the pdf into 
discrete distributions, each of which could then be assigned an av- 
erage value and associated spread. Thus significant degeneracies in 
the pdf resulting from certain values of the observables could be 
identified and handled suitably. We hope to explore this avenue in 
a future paper. 

We have shown that by using all of the available information 
one can constrain the metallicities of stars to greater accuracy than 
the observations themselves. It is reasonable to expect that this 
same effect can be achieved for other observable quantities such 
as the surface gravity, for which the nominal observational errors 
are often sizeable. In this manner one can use the small errors on 
certain observables (such as apparent magnitudes) to shrink con- 
servative errors on less well-constrained values. So long as one has 
confidence in the stellar models used, this promises to be a power- 
ful technique for survey analysis. It should be noted that this tech- 
nique can only realistically be applied once - 'looping', by rerun- 
ning the analysis using the new values and errors, is prohibited - 
since its effectiveness comes from the application of independent 
prior knowledge to data with mildly overquoted errors. The mathe- 
matical formalism does not permit the subsequent application of a 
prior that is not independent of the data. 

One more point deserves mention. If one knows the distances 
to the stars in some specific sample (of a generic type that can be 
described by model isochrones), then one can in theory explore 
different forms for the prior in order to optimize one's estimation 
of their distances. This could then be fed back in to the algorithm 
in order to provide a more accurate estimation of distances to other 
stars. 

We are currently using this method to obtain distances to in 
excess of 250 000 RAVE stars and hope shortly to report the results 
of this work. 
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APPENDIX A: JACOBIAN QUESTIONS 

There is some confusion in the literature regarding the necessity 
or otherwise of a Jacobian term in the transition from considering 
the likelihood term as a function of x to considering it implicitly 
as a function of y(x) (such as when it is converted to the form 
G(y — y(x),CTy)). The Borel-Kolmogorov paradox (see, for ex- 
ample, |javnesll2003l ) cautions that this step should be taken with 
care. However, it is a simple matter to demonstrate that the term is 
unnecessary. 

If we consider, for a moment, y not to be a function of x but 
rather to be described by a pdf on x, we can expand equation Q: 



^P{S\y,x,ay)[ /p(y|y',x,CTjp(y'|x)dy' p(x); (A2) 



but then we can express the true functional dependence of y' by 



P{^\y,o-viS) oc P{S\y,x,cry) p{y\x, ay) p{x) 



(Al) 





(A3) 



leading directly to 



p{x\y,o-v,S) oc P(S'|y,x,cry)p(y|y(x),CTj,)p(x), 



(A4) 



which contains no Jacobian. 



